The Galapagos giant tortoise Chelonoidis phantasticus is not extinct

The status of the Fernandina Island Galapagos giant tortoise (Chelonoidis phantasticus) has been a mystery, with the species known from a single specimen collected in 1906. The discovery in 2019 of a female tortoise living on the island provided the opportunity to determine if the species lives on. By sequencing the genomes of both individuals and comparing them to all living species of Galapagos giant tortoises, here we show that the two known Fernandina tortoises are from the same lineage and distinct from all others. The whole genome phylogeny groups the Fernandina individuals within a monophyletic group containing all species with a saddleback carapace morphology and one semi-saddleback species. This grouping of the saddleback species is contrary to mitochondrial DNA phylogenies, which place the saddleback species across several clades. These results imply the continued existence of lineage long considered extinct, with a current known population size of a single individual.

2 the Fern 1906 specimen is exceptionally well preserved, with over 99% of reads originating from endogenous DNA. However, the DNA itself was still highly fragmented however, with an average length of 119 bp, as visualized on a bioanalyzer. When the sequences obtained were trimmed and overlapping forward and reverse reads were collapsed (92% of reads collapsed), there was a mean read length of 176 bp. So, although we observe that there is a high proportion of endogenous DNA, that DNA is still degraded, as expected with the age of the specimen.
Additionally, the MapDamage analysis indicated very low rates of DNA damage, with a minimal increase in base misincorporation of C->T overall, and no trend in the misincorporation rate relative to position along a read.
After filtering the SNP variants for minor allele count of 1, allowing no missing data, having a maximum mean depth cut off 1 SD above the mean (here equal to 31.3x) and pruning for LD, a total of 751,800 SNP loci were retained for use in the PCA analysis.
Nuclear genome phylogenetic trees Because consensus species tree construction can be affected by the number of gene trees, the length of sequence used to create those trees, and non-independence (i.e., linkage) between trees, we created four datasets of genomic segments of different lengths (10kb or 100kb) and separated by different distances (100kb or 1Mb). The number of gene trees in a data set ranged from 306 (100kb segments separated by 1Mb) to 7212 (10kb segments separated by 100kb). We recovered nearly identical topologies across datasets (Figs. 2B, S1-3), although node support was lower in consensus trees created from datasets with fewer gene trees (e.g., Fig. S2-3).

3
The non-monophyly and poor node support for the Fernandina tortoises when analyzed with the 13 other species was a surprise given the close genetic relationship between these two tortoises and their shared island of origin. Given the low support for other nodes across the tree, we suspected that the recent radiation of these species had led to low sequence divergence and high rates of incomplete lineage sorting (ILS) among species, leading to a high incongruence among gene trees. However, given that both the consensus tree and the carapace morphology of Galapagos giant tortoises support the monophyly of saddleback tortoises, we created four new datasets, using the same filtering parameters described above, but this time only including sequences from Fernanda, the species with saddleback morphology (i.e., the species from San Cristóbal, Pinzón, Española, Pinta, and Fernandina), and the C. chilensis outgroup. The consensus trees produced from these four saddleback datasets showed highly congruent topologies (Fig. 2C, Figs. S4-6). Notably, all described species were highly supported monophyletic groups, including the two tortoises from Fernandina. Again, we observe that consensus trees made from fewer and shorter segments have lower node support ( Fig. S4) compared to those made from more segments (Fig. S5) or longer segments (Fig. 2C).
None of the nuclear trees place the individuals of C. becki into a monophyletic clade. This species, found on Volcano Wolf at the northern end of Isabela Island, is known to consist of two lineages (referred to as PBL and PBR) originating from different colonization events 1 , yet our nuclear phylogenies find the PBL lineage itself to be split across major clades in the tree. This placement may have been influenced by recent gene flow into C. becki as a result of humans transporting tortoises or represent historical admixture. 4

Mitogenome phylogenetic trees
The best-fit partitioning scheme for each downstream analysis, and the selected nucleotide substitution models are given in Table S3. The ML and BI analyses resulted in phylogenetic trees with lnL= -27,213.56 and lnL= -26,919.10 (harmonic mean), respectively. All MCMC diagnostic metrics indicated that the iterations of BI analysis reached convergence and stationarity. The average standard deviation of split frequencies was smaller than 0.01, the plot of generation versus log-likelihood of the data had characteristic "white-noise" morphology after burn-in. The PSRF values were near 1.00 (range 0.999 -1.000) and the minimum ESS values were well over 100 (the minimum value was 1,072.40).
Tree topologies from the BI and ML analyses were identical ( There are important points of discordance between the nuclear phylogenies and that based on the mitochondrial genome. Most noticeably, the saddleback species form a monophyletic clade in the nuclear trees, which is in discordance with trees based on the mitochondrial genome, where they are dispersed across the tree (see Supplementary Text, Fig. S7, 2 ). Additionally, the two species from Santa Cruz Island are sister taxa on the nuclear trees, whereas they are in different clades in the mitochondrial tree. These findings have important implications for our 5 understanding of how the saddleback morphology evolved in Galapagos tortoises, and the phylogeographic history of the radiation.

Explorations of heterozygosity estimates
The average depth of sequencing coverage varied from 9.5X to 34X across our samples (Supplemental File 2). Because estimates of heterozygosity can be affected by coverage, we used ANGSD 3 to down-sample each BAM file in our sample set to an average of 9.5X coverage with option -downSample C, where C is 9.5 divided by the sample coverage. We then re-calculated genome-wide heterozygosity in ANGSD, using the 1 sample SFS Estimation method.
We found that in our sample set, there was no correlation between depth of coverage and heterozygosity estimate using either VCFtools or ANGSD (Supplemental File 2). Furthermore, although the estimates from ANGSD and VCFtools were different, the estimates were highly correlated. Finally, we found that down-sampling the BAMS to 9.5X led to heterozygosity being underestimated by an average of 4.6%, which is small compared to the difference seen between down-sampled individuals. For example, the difference in heterozygosity between Fernanda and other tortoises ranges from 7-200%. These results suggest that qualitative comparisons of heterozygosity are reliable in our sample set, even when depth of coverage differs among samples.
In addition, transitions can occur at artificially higher rates in ancient samples due to cytosine deamination in ancient DNA samples. Because sample CAS8101 is a museum specimen, we reestimated heterozygosity by removing transitions in ANGSD (-noTrans 1). The estimate of heterozygosity for CAS8101 was marginally reduced relative to modern samples (Supplemental 6 File 2). However, it was still the sample with the second highest genome-wide heterozygosity, even by this measure.   Tables   Table S1.

Supplementary References
Mean autosomal heterozygosity estimated using VCFtools within each species of Galapagos giant tortoise and the two individuals from Fernandina Island.